Land-air bimodal vehicles are flourishing in both academia and industry, as they combine the high mobility of aerial vehicles with the long endurance of ground vehicles. In this work, we propose an autonomous and adaptive navigation framework to bring full autonomy to this class of vehicles. The framework mainly consists of 1) a hierarchical motion planner that generates safe and low-power land-air trajectories in unknown environments, and 2) a unified motion controller that dynamically adjusts energy consumption during terrestrial locomotion. Extensive real-world experiments and benchmark comparisons are carried out on a customized robot platform to validate the robustness and performance of the proposed framework. During the tests, the robot safely traverses complex environments with land-air integrated mobility and achieves 7% energy savings in terrestrial locomotion. Finally, we will release our code and hardware configuration for reference by the community.
Spatial-temporal data contain rich information and have been widely studied in recent years owing to the rapid development of related applications in many fields. For example, medical institutions often use electrodes attached to different parts of a patient's body to analyze electrophysiological data rich in spatial and temporal features for health assessment and disease diagnosis. Existing research has mainly used deep learning techniques such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to extract hidden spatial-temporal features. However, simultaneously incorporating interdependent spatial information and dynamic temporal changes is challenging. In practice, models that exploit these spatial-temporal features for complex prediction tasks usually require large amounts of training data to obtain satisfactory performance. Considering the above challenges, in this paper we propose an adaptive federated relevance framework, namely FedRel, for spatial-temporal graph learning. After transforming the raw spatial-temporal data into high-quality features, the core Dynamic Inter-Intra Graph (DIIG) module in the framework uses these features to generate spatial-temporal graphs that capture hidden topological and long-term temporal correlation information. To improve the generalization ability and performance of the model while preserving local data privacy, we also design a relevance-driven federated learning module that exploits the diverse data distributions of different participants through an attentive aggregation of their models.
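The relevance-driven, attentive aggregation described above can be illustrated with a small numpy sketch. This is a hypothetical stand-in, not FedRel's actual formulation: the function name `attentive_aggregate` and the choice of cosine similarity to the mean client update as the relevance score are assumptions for illustration.

```python
import numpy as np

def attentive_aggregate(global_params, client_params):
    """Aggregate client parameter vectors, weighting each client by the
    cosine similarity of its update direction to the mean update."""
    updates = [c - global_params for c in client_params]
    mean_update = np.mean(updates, axis=0)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    scores = np.array([cos(u, mean_update) for u in updates])
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax attention
    return global_params + sum(w * u for w, u in zip(weights, updates))

rng = np.random.default_rng(0)
g = np.zeros(4)
clients = [g + rng.normal(size=4) for _ in range(3)]
new_g = attentive_aggregate(g, clients)
```

Clients whose updates agree with the consensus direction receive larger weights, which is one simple way to realize "attentive" aggregation over heterogeneous participants.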
Graph representation learning has attracted increasing attention in recent years, especially for learning low-dimensional embeddings for classification and recommendation tasks at the node and graph levels. To enable representation learning on large-scale real-world graph data, many studies have focused on developing different sampling strategies to facilitate the training process. Here, we propose an adaptive Graph Policy-driven Sampling model (GPS), in which the influence of each node in its local neighborhood is realized through adaptive correlation computation. Specifically, the selection of neighbors is guided by an adaptive policy algorithm and contributes directly to the message aggregation, node embedding update, and graph-level readout steps. We then conduct comprehensive experiments on graph classification tasks from various perspectives. Our proposed model outperforms existing ones by 3%-8% on several important benchmarks, achieving state-of-the-art performance on real-world datasets.
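The neighbor-selection step can be sketched as follows. Using raw embedding dot-products as the policy's scores and a fixed top-k cutoff is an illustrative simplification of the learned adaptive policy; the function and variable names are mine, not the paper's.

```python
import numpy as np

def policy_sample_aggregate(h, adj_list, k=2):
    """For each node, score neighbours by embedding correlation (a stand-in
    for the learned policy), keep the top-k, and mean-aggregate their
    embeddings together with the node's own embedding."""
    out = np.empty_like(h)
    for v, nbrs in enumerate(adj_list):
        if not nbrs:
            out[v] = h[v]
            continue
        scores = np.array([h[v] @ h[u] for u in nbrs])
        keep = [nbrs[i] for i in np.argsort(scores)[::-1][:k]]
        out[v] = np.mean([h[v]] + [h[u] for u in keep], axis=0)
    return out

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 8))                 # 5 nodes, 8-dim embeddings
adj = [[1, 2, 3], [0], [0, 4], [0], [2]]    # adjacency lists
h_new = policy_sample_aggregate(h, adj, k=2)
```

In the full model the scores would come from a trainable policy and the aggregated embeddings would feed the graph-level readout.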
Relative radiometric normalization (RRN) of different satellite images of the same terrain is necessary for change detection, object classification/segmentation, and mapping tasks. However, traditional RRN models are not robust, being disturbed by object changes, and RRN models that precisely account for object changes cannot robustly obtain the no-change set. This paper proposes an automatic, robust relative radiometric normalization method based on latent change noise modeling. It exploits the prior knowledge that no-change points exhibit small-scale noise under relative radiometric normalization while change points exhibit large-scale radiometric noise after normalization, combined with a stochastic expectation-maximization method, to quickly and robustly extract the no-change set and learn the relative radiometric normalization mapping function. This grounds our model theoretically in probability theory and mathematical deduction. Specifically, when we select histogram matching as the relative radiometric normalization scheme and combine it with mixture-of-Gaussians noise modeling (HM-RRN-MoG), the HM-RRN-MoG model achieves the best performance. Our model is robust against clouds, fog, and changes. Our method naturally generates a robust evaluation indicator for RRN, namely the root mean square error on the no-change set. We apply the HM-RRN-MoG model to subsequent vegetation/water change detection tasks, where it reduces the radiometric contrast and the NDVI/NDWI differences on the no-change set, producing consistent and comparable results. We also utilize the no-change set in a building change detection task, effectively reducing pseudo-changes and improving precision.
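The latent-noise idea can be sketched end-to-end: histogram-match the source image to the reference, then fit a two-component zero-mean Gaussian mixture to the residual by EM and keep the pixels assigned to the small-variance component as the no-change set. This is a simplified 1-D sketch under my own assumptions (equal-sized arrays, plain batch EM), not the paper's exact stochastic-EM procedure.

```python
import numpy as np

def match_histograms(src, ref):
    """Map the sorted values of `src` onto the sorted values of `ref`
    (histogram matching for equally sized 1-D arrays)."""
    out = np.empty_like(ref, dtype=float)
    out[np.argsort(src)] = np.sort(ref)
    return out

def no_change_mask(residual, iters=60):
    """Fit a two-component zero-mean Gaussian mixture to the normalization
    residual by EM; pixels assigned to the small-variance component form
    the no-change set."""
    r = np.asarray(residual, dtype=float)
    var = np.array([0.1 * r.var(), 2.0 * r.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(-r[:, None] ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)        # E-step
        pi = resp.mean(axis=0)                               # M-step
        var = (resp * r[:, None] ** 2).sum(axis=0) / resp.sum(axis=0)
    small = np.argmin(var)
    return resp[:, small] > 0.5

rng = np.random.default_rng(2)
ref = rng.normal(100.0, 10.0, size=1000)
src = 0.8 * ref + 5.0                      # pure radiometric distortion
matched = match_histograms(src, ref)       # radiometrically aligned again
resid = np.concatenate([rng.normal(0, 0.5, 900),    # no-change: small noise
                        rng.normal(0, 20.0, 100)])  # changes: large noise
mask = no_change_mask(resid)
```

The mean squared residual over `mask` would then serve as the no-change-set RMSE evaluation indicator mentioned above.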
We present a spatial-temporal federated learning framework for graph neural networks, namely STFL. The framework explores the underlying correlation of the input spatial-temporal data and transforms it into both node features and an adjacency matrix. The federated learning setting of the framework ensures data privacy while achieving good model generalization. Experimental results on the sleep stage dataset ISRUC_S3 illustrate the effectiveness of STFL on graph prediction tasks.
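The graph-construction step can be illustrated as follows. Using thresholded absolute Pearson correlation between channels is an assumption for illustration; STFL's actual node-feature/adjacency transformation may differ.

```python
import numpy as np

def build_graph(signals, threshold=0.3):
    """Turn multi-channel time series (channels x time) into node features
    and an adjacency matrix via absolute Pearson correlation."""
    node_features = signals                        # each channel is a node
    corr = np.corrcoef(signals)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                     # no self-loops
    return node_features, adj

rng = np.random.default_rng(3)
base = rng.normal(size=200)
x = np.stack([base + 0.1 * rng.normal(size=200),   # correlated pair
              base + 0.1 * rng.normal(size=200),
              rng.normal(size=200)])               # independent channel
feats, adj = build_graph(x)
```

In the sleep-staging setting each EEG electrode would be one node, and the resulting graph would feed a GNN trained under the federated protocol.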
While there has been much research on hardware acceleration of deep learning for images, considerably less attention has been paid to accelerating deep learning applications involving graphs. The unique characteristics of graphs, such as irregular memory access and dynamic parallelism, impose several challenges when the algorithm is mapped to a CPU or GPU. To address these challenges while exploiting all available sparsity, we propose a flexible architecture called SPA-GCN for accelerating graph convolutional networks (GCNs), the core computation unit in deep learning algorithms on graphs. The architecture is specialized for processing many small graphs, since graph size has a significant impact on design considerations. In this context, we use SimGNN, a neural-network-based graph matching algorithm, as a case study to demonstrate the effectiveness of our architecture. Experimental results show that SPA-GCN delivers high speedups compared to a multi-core CPU implementation and a GPU implementation, demonstrating the efficiency of the design.
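The core GCN computation that such accelerators target can be sketched in software. This is a minimal software reference, not SPA-GCN's hardware design; the CSR layout, mean aggregation, and ReLU are assumptions chosen to expose the irregular, sparsity-driven memory accesses the abstract refers to.

```python
import numpy as np

def gcn_layer_csr(indptr, indices, h, w):
    """One GCN layer over a graph stored in CSR form: mean-aggregate
    neighbour features, project with weight matrix `w`, apply ReLU.
    The per-row gather over `indices` is the irregular memory access
    pattern that a hardware pipeline must handle efficiently."""
    out = np.zeros((len(indptr) - 1, w.shape[1]))
    for v in range(len(indptr) - 1):
        nbrs = indices[indptr[v]:indptr[v + 1]]
        agg = h[nbrs].mean(axis=0) if len(nbrs) else h[v]
        out[v] = np.maximum(agg @ w, 0.0)
    return out

# Tiny undirected graph with edges 0-1 and 0-2, stored as CSR
indptr = np.array([0, 2, 3, 4])
indices = np.array([1, 2, 0, 0])
h = np.eye(3)            # one-hot node features
w = np.ones((3, 2))      # toy weight matrix
z = gcn_layer_csr(indptr, indices, h, w)
```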
The malicious applications of deepfakes (i.e., techniques that generate target facial attributes or entire faces from facial images) pose a huge threat to individuals' reputation and security. To mitigate these threats, recent studies have proposed adversarial watermarks against deepfake models, causing them to produce distorted outputs. Despite their impressive results, these adversarial watermarks have low image-level and model-level transferability, meaning that each can protect only one facial image against one specific deepfake model. To address these issues, we propose a novel solution that generates a Cross-Model Universal Adversarial Watermark (CMUA-Watermark), protecting a large number of facial images against multiple deepfake models. Specifically, we first propose a cross-model universal attack pipeline that attacks multiple deepfake models iteratively. Then we design a two-level perturbation fusion strategy to alleviate the conflicts between the adversarial watermarks generated for different facial images and models. Moreover, we address the key problem of cross-model optimization with a heuristic approach that automatically finds suitable attack step sizes for different models, further weakening model-level conflicts. Finally, we introduce a more reasonable and comprehensive evaluation method to fully test the proposed method and compare it with existing ones. Extensive experimental results demonstrate that the proposed CMUA-Watermark can effectively distort the fake facial images generated by multiple deepfake models while achieving better performance than existing methods.
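The iterative cross-model attack loop can be caricatured on toy linear "models". Everything here is an illustrative stand-in for the paper's pipeline on real deepfake networks: the linear models, the sign-gradient ascent on summed output distortion, and the `alphas` per-model step sizes are my assumptions, not the actual method.

```python
import numpy as np

def cmua_style_watermark(models, eps=0.1, steps=50, alphas=None):
    """Toy cross-model universal perturbation: sign-gradient ascent on the
    summed output distortion of several linear 'deepfake models', with a
    per-model step size and an L-infinity budget `eps`."""
    d = models[0].shape[1]
    rng = np.random.default_rng(4)
    w = rng.uniform(-eps / 10, eps / 10, size=d)   # small random init
    alphas = alphas or [eps / steps] * len(models)
    for _ in range(steps):
        for m, a in zip(models, alphas):           # iterate over models
            grad = 2.0 * m.T @ (m @ w)             # d/dw of ||m @ w||^2
            w = np.clip(w + a * np.sign(grad), -eps, eps)
    return w

rng = np.random.default_rng(5)
models = [rng.normal(size=(6, 6)) for _ in range(3)]
w = cmua_style_watermark(models)
distortion = sum(np.linalg.norm(m @ w) for m in models)
```

The interleaved per-model updates and the per-model step sizes mirror, in spirit, the iterative attack pipeline and the heuristic step-size search described above.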
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
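The simplex-ETF target that such a regularizer pulls the feature centers toward can be written down directly: for K classes, every pair of centered, normalized class centers should have cosine similarity -1/(K-1). A minimal sketch, assuming a mean-squared penalty on the pairwise cosines (the paper's exact regularizer may differ):

```python
import numpy as np

def etf_regularizer(centers):
    """Penalize deviation of the pairwise cosines of the (centered,
    normalized) class feature centers from the simplex-ETF target
    -1/(K-1)."""
    k = centers.shape[0]
    c = centers - centers.mean(axis=0)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    cos = c @ c.T
    target = -1.0 / (k - 1)
    off = ~np.eye(k, dtype=bool)              # off-diagonal entries only
    return float(((cos[off] - target) ** 2).mean())

# A K-class simplex ETF embedded in K dimensions incurs zero penalty.
K = 5
etf = np.eye(K) - np.full((K, K), 1.0 / K)
loss_etf = etf_regularizer(etf)
loss_rand = etf_regularizer(np.random.default_rng(6).normal(size=(K, 8)))
```

Because the target depends only on angles, the penalty encourages equiangular, maximally separated centers without constraining their norms, which is what benefits the minor classes.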
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
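The CAM computation both issues stem from is simple to state: the map for a class is the classifier-weighted sum of the final convolutional feature maps. A minimal sketch (shapes and names are illustrative):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, cls):
    """CAM for class `cls`: weighted sum of the final conv feature maps
    (C x H x W) using that class's classifier weights (num_classes x C),
    followed by ReLU and max-normalization."""
    cam = np.tensordot(fc_weights[cls], feature_maps, axes=1)  # H x W
    cam = np.maximum(cam, 0.0)
    return cam / cam.max() if cam.max() > 0 else cam

rng = np.random.default_rng(7)
feats = rng.random((4, 7, 7))     # 4 channels, 7x7 spatial resolution
w_fc = rng.random((10, 4))        # classifier weights for 10 classes
cam = class_activation_map(feats, w_fc, cls=3)
```

Since the classifier weights are trained for recognition, high activations land on whatever features predict the class, including co-occurring context such as water around fish; this is exactly the entanglement the causal intervention above aims to remove.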
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
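The photometric error used as the self-supervised signal in the second stage can be sketched as follows. This is a rough stand-in, assuming an L1 term mixed with a simplified global SSIM term; real self-supervised depth/pose pipelines typically use windowed SSIM, and the weighting `alpha` is an illustrative choice, not PPGeo's.

```python
import numpy as np

def photometric_error(target, reconstruction, alpha=0.85):
    """Photometric error between a target frame and its reconstruction,
    mixing mean L1 with a (simplified, whole-image) structural
    dissimilarity term."""
    l1 = np.abs(target - reconstruction).mean()
    # Simplified SSIM computed globally instead of over local windows.
    mu_t, mu_r = target.mean(), reconstruction.mean()
    var_t, var_r = target.var(), reconstruction.var()
    cov = ((target - mu_t) * (reconstruction - mu_r)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_r + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_r ** 2 + c1) * (var_t + var_r + c2))
    return alpha * (1.0 - ssim) / 2.0 + (1.0 - alpha) * l1

rng = np.random.default_rng(8)
img = rng.random((32, 32))
err_same = photometric_error(img, img)               # perfect reconstruction
err_diff = photometric_error(img, rng.random((32, 32)))
```

In the pipeline, `reconstruction` would be the current frame re-synthesized from predicted depth and ego-motion, so minimizing this error trains the encoder without any labels or calibration.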